Variational ansatz for quasispecies in the Eigen model 
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We investigate the error threshold for the emergence of quasispecies in the Eigen model. By map- 
ping to an effective Hamiltonian ruled by the "imaginary-time" Schrodinger equation, a variational 
ansatz is proposed and applied to calculate various properties associated with the quasispecies. The 
variational ansatz gives correct prediction for the survival population of the wild-type sequence and 
also reveals an unexpected universal scaling behavior near the error threshold. We compare the 
results from the variational ansatz with that from numerical methods and find excellent agreement. 
Though the emergence of the scaling behavior is not yet fully understood, it is remarkable that the 
universal scaling function reigns even for relatively short genome length such as L — 16. Further in- 
vestigations may reveal the mechanism of the universal scaling and extract the essential ingredients 
for the emergence of the quasispecies in molecular evolution. 



I. INTRODUCTION 

Molecular perspective for biological evolution [IHS] 
stired up interesting ideas and attracted much attention 
in recent decades. In 1971, Eigen proposed a simple but 
profound model to study the process of molecular replica- 
tion and introduced the concept of quasispecies [2 which 
turned out to be crucial in understanding the fundamen- 
tal behavior of evolution. A quasispecies consists of a 
wild type with large fitness accompanied by a large num- 
ber of mutant types in sequence space [B]. The Eigen 
model refines our understanding of the evolutionary dy- 
namics: Selection moves the population towards the bet- 
ter adapted mutants and explores for even better ones 
in the sequence space by random mutations. The Eigen 
model is particularly suitable for viral evolution|U|S] and 
population genetics [TJ [7HS] and is also widely applied to 
general evolutionary theories. 

Error threshold[10, 11 appears as an upper bound on 
mutation rate, above which no effective selection can oc- 
cur and the quasispecies becomes unstable. It places lim- 
its on how genetic information can be maintained and 
passed on from generations to generations and restricts 
possible theories for the origins of life. However, it is 
quite puzzling that some RNS viruses [ITHT3] seem to 
have mutation rates close to the error threshold. The 
error threshold for the Eigen model was first derived by 
ignoring mutations of mutants back to the wild type, 
known as the assumption of no back mutation[8_. Later, 
the model was extensively studied by mapping to equiva- 
lent statistical models fTH - TTB] and the existence of the er- 
ror threshold can be viewed as a phase transition from the 
ordered phase (quasispecies) to the disordered phase (no 
effective selection). Exact solution of the model [TUH2T| 
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shows that the error threshold is identical to that derived 
under the assumption of no back mutation. In addition, 
the phase transition is shown to be first-order [22 . 

Though the exact solution [T9"H2T| is available for the 
Eigen model, due to technical complications, it is not yet 
clear why the error threshold derived within the assump- 
tion of no back mutation is identical to the exact solu- 
tion. Meanwhile, it is desirable to devise a mean-field 
approach capturing the essence of the phase transition 
so that the order parameter can be computed (and un- 
derstood) in the easiest way. Inspired by the fact that 
this is a first-order phase transition, it is possible for one 
to come up with a variational ansatz capturing the phase 
transition. In this article, we make a full use of the sym- 
metry and propose a simple variational ansatz contain- 
ing only two configurations in the sequence space. It 
is rather remarkable that the variational approach cap- 
tures the essence of the phase transition including the 
error threshold and also the evolution of the wild-type 
population. It is quite surprising that error thresholds for 
different genome lengths, values of mutation rate and rel- 
ative fitness collapse onto a universal curve, giving credit 
to the power of the variational ansatz. We also perform 
numerical calculations to verify the results obtained by 
the variational ansatz and find nice agreement between 
them. 

This article is organized as follows. In Sec. [IT] we briefly 
introduce the Eigen model and a gauge transformation 
to an equivalent Hamiltonian with imaginary-time dy- 
namics. In Sec. |III[ the Hamiltonian can be solved by 
the proposed variation ansatz, giving the error threshold 
and also the evolution of the wild-type population. In 
Sec. |IV| we calculate the back-mutation rate to the wild 
type and confirm that the error threshold remains the 
same within the approximation of no back mutation. In 
Sec. ^ we present a numerical approach to the Eigen 
model and compare the results with the variational ap- 
proach. 
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II. EIGEN MODEL 

Consider a DNA chain with genome length L. The 
primary structure at each site of the chain takes on four 
different nucleotides (G, A, C and U). For simplicity, 
we choose to distinguish only among purines (R) and 
pyrimidines (Y). Thus, the total size of sequence space 
is reduced to 2 L , yet still huge for numerical simulations. 
A viral genome with typical length L ~ 1000 will lead 
to a (reduced) sequence space of the size ~ 10 300 . For 
more complex forms of life, the size of the sequence space 
increases tremendously and methods developed in statis- 
tical mechanics seem appropriate and powerful to address 
the dynamics of the genome evolution. 

The time-dependent relative population of a particu- 
lar sequence i is denoted as Xi(t). The selection is on 
the genotype level described by the fitness fi for each 
sequence. In general, the fitness function can be rather 
complicated and evolves with time. Here it is assumed to 
take on its time-averaged value and treated as a constant. 
Eigen proposed a deterministic dynamics to describe the 
relative populations of the sequences in the presence of 
mutations, 

= ( m ij h ~ <t> 5 a ) x i ( 1 ) 

3 

where Xi and /, are the relative population and the fit- 
ness of the sequence i, while <j> = J2i x ih is the average 
fitness for all sequences. The relative populations are 
normalized with Xi = 1 initially and remain so un- 
der the time evolution of the quasispecies equation. The 
mutation matrix rriij describes the probability to mutate 
from sequence j to sequence i in a generation and satis- 
fies 'Ylii m i] = 1- If only point mutations are considered, 
it is straightforward to work out the mutation matrix 

my =u^(l-u) L - d », (2) 

where u is the probability for a single point mutation 
to occur within a generation and dy is the Hamming 
distance [23] between the sequences. Since dij — dji, it is 
clear that the mutation matrix my is also symmetric. 

The quasispecies equation can be transformed into an 
imaginary-time Schrodinger equation by the gauge trans- 
formation, 

Mt) = ^(t)e F(t) , (3) 

where W(t) — 4>(t). Applying the gauge transformation 
to the quasispecies equation, the dynamics of ^(t) is 
captured by an effective Hamiltonian ffy, 

' H ' lt ' Y."'^r Wlth "■■ \ /'/..'"'/• ( 4 ) 
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One notices that the above equation can be obtained from 
the usual Schrodinger equation by replacing the time 



variable t —> it (and setting h = 1). Following the stan- 
dard textbooks, the general solution for the "imaginary- 
time" Schrodinger equation is 

n 

where E n and are the eigenvalues and eigenvectors 
of the effective Hamiltonian _£/y. Note that the factor 
y/fi m the gauge transformation is crucial to make the 
Hamiltonian symmetric. 

Unlike the intrinsic oscillatory nature of the usual 
Schrodinger equation, its imaginary-time version only 
keeps the ground state in the infinite-time limit, 

lim *,(t) — > c ^e- Eot , (6) 

t— 5-00 

and greatly simplifies the calculations. Making use of 
the normalization condition, J^. = 1, the exponential 
factor in the gauge transformation can be expressed as 

e w (t)= J2 * %{t ). (7) 

The inverse gauge transformation can thus be found, 

x t {t) = -^=Mt)e- W(t) 

V Ji 

In the infinite-time limit, only the ground-state wave 
function survives and the relative populations are 

x* = lim xAt) = I 1 — | 4^$?. (9) 

Note that the constant cq cancels itself out and does not 
appear in x*. To find the survival populations x*, one 
only needs to find the ground state $° of the effective 
Hamiltonian. 



III. VARIATIONAL ANSATZ 

In Eigen's original proposal, the wild-type sequence, 
say, has maximum fitness /o = ju and all other 
sequences i have a common background fitness 
fi = f < Jm- Even though the single-peak fitness land- 
scape is simple, the size of the sequence space is still 
huge N s = 2 L . Since the fitness landscape is obviously 
symmetric around the wild- type peak, it is convenient to 
introduce the representation for arbitrary configurations, 

* = (^°,v 1 ,-> i ), (10) 

where -0 d is a row vector representing states with Ham- 
ming distance d to the peak and thus has the dimension 
of = L\/d\(L-d)\. 
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Figure 1: The survival population xq of the wild- type se- 
quence plotted against the point mutation rate u and the 
rescaled variable g = uL/\nr. The fitness landscape consists 
of a single sharp peak with relative fitness r — fia/f = 2 to 
a uniform background fitness / for all other sequences. The 
genome length starts from L = 10 (red) to L — 22 (purple). 
It is clear that the crossover (from the localized quasispecies 
to the extended state without selection) becomes sharper as 
the genome length increases. For given relative fitness r, all 
curves for Xq (it, L) can be collapsed onto the universal scaling 
function by changing the variable to g — uL/lnr. It is clear 
that the mutation threshold is g c — 1. 



Since we are interested in the low-energy sector of the 
effective Hamiltonian, it is sufficient to keep the so-called 
"s-wave" sector|21 only. That is to say, out of the many 
states with the same Hamming distance to the wild-type 
sequence, one only needs to keep the totally symmetric 
state for each Hamming distance, 



= (V> ,0,..,0) = (1,0,0,..., 0), 



(11) 



(0,i/>\0,...,0) 



-(0,1, 1, ..,1,0,0, ..0), (12) 



and other $! ns can be constructed in a similar way. Writ- 
ing down the effective Hamiltonian explicitly, 



L—dj 



(13) 



one can construct the reduced (L + 1) x (L + 1) Hamil- 
tonian in the s-wave sector 



Hi, 



(14) 



and numerically diagonalize the Hamiltonian to obtain 
the ground state. The s-wave symmetry helps tremen- 
dously and reduces the size of the relevant sequence space 
from N s = 2 L to just (L + 1). The comparison between 
the s-wave decomposition and the exact solution will be 
discussed in Section fVl 

Though the s-wave decomposition largely reduces the 
effective dimension of the sequence space, it is still too 
complicated for analytic manipulations. Here we choose 
two symmetric base functions, 

*° = (1,0,..., 0), V 1 = — L— (0,1, !,...,!), 
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(15) 

and use variational approach to estimate the error thresh- 
old for the Eigen model. Construct the variational wave 
function for the ground state 

* 4 = Co *° + Cl ^, (16) 

so that the variational energy takes the form 



E(c ll c 2 , A) 



£ 

n,m— 0,1 



c H v c 



•A(cg 



(17) 



where the Lagrange multiplier arises from the normaliza- 
tion constraint Cq + c\ = 1 and the variational Hamil- 
tonian H^ m = Ylij ^"Hij^J 1 is a 2 x 2 real symmetric 
matrix, 



#00 



-f M (l-u) L , 
f 



(18) 



\L — di 




u 1 ' 



TTV 



N„ - 1 L V 



(19) 



(20) 



Minimizing the variational energy _E(ci,C2,A) is equiva- 
lent to diagonalizing the 2x2 variational Hamiltonian 



Hnmi which can be rewritten as 



Hnm — CI + / 



-e 
-A 



-A 



(21) 



where the constant term does not affect the frequencies 
Xi{t) after the inverse gauge transformation, 



C 



/a/TOqo + / 



/ 



(1 - m 00 ) 



(22) 



with moo = (1 — u) L and will be dropped in the following 
calculations. The dimensionless parameters e and A in 
the variational Hamiltonian are 



e(u, L, r) 
A{u,L,r) 



1 



[rm oa 



N„ - 1 



1] 
[1- 



1 - m 0Q 



2(N S 
m 00 \ , 



1) 



(23) 
(24) 



where r — Jm //is the relative fitness of the dominant 
sequence and N s = 2 L is the size of the sequence space. 
The eigenvalues of the variational Hamiltonian are 



£ 4 



(25) 



Therefore, the ground-state energy within the variational 
approximation corresponds to the negative eigenvalue 
E = E- with the eigenvector 



£0 

Cl 



Ve 2 + A 2 - e 



Ve 2 + A 2 + e 
A ' 



(26) 



With the coefficients cq and c\ at hand, the ground 
state within the variational approximation is 



Co, 



Cl 



Cl 



Cl 



1 



1 



(27) 



The relative population of the wild-type sequence in the 
infinite-time limit can be computed from the variational 
ground state, 



Xo(u,L,r) 




Ve 2 + A 2 + e + y/r{2 L - 1)A ' 



(28) 



As shown in Fig. 1, the survival population of the 
wild-type sequence Xq is plotted against the probability 
of point mutation u for different genome lengths rang- 
ing from L = 10 to L — 22 for the relative fitness 
r = f M /f = 2. 

The error threshold, where the quasispecies disappears, 
marks the phase transition and, in theory, is only well 
defined in the thermal dynamic limit with infinite genome 
length. Although there is no extra numerical cost to 
choose a longer genome length L within the variational 
approach, we intentionally choose relatively short Ls to 
highlight the sharpness of the phase transition at finite 
genome lengths. Furthermore, for chosen relative fitness 
r, the survival populations Xq(u,L) for different u and 
L can be collapsed onto a universal curve when plotted 
with the rescaled variable 
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uL 
lnr 



(29) 



As is evident in Fig. 1, the collapse is almost perfect 
even for such short genome lengths. The error threshold 
occurs at g c — 1, i.e., u c L = ln(/jvf//) which agrees with 
the exact solution. 

The universal curves for the survival population 
Xq((?, r) can be derived by taking the genome length L to 
infinity. Taking L — > 00 but keeping g and r finite, the 
stay-on probability is moo = (1 — U ) L ~> r§ - Since the 
off-diagonal term A is negligibly small in this limit, the 
survival population of the wild-type sequence is greatly 
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Figure 2: The universal scaling function Xg(g,r) in the limit 
of infinite genome length. The transition is sharp and the 
universal scaling function varies with relative fitness from r = 
1 (red) to r = 13 (purple). 



simplified, 

Xo(g,r) = lim Xg(u,r,L) 



I [Ir 1 ^ - 1| + (r 1 - 9 - 1) 



9 - 1| + (r^s - 1)] + r(l - r-3) 



(30) 



Since the relative fitness r is greater than unity, the nu- 
merator is zero for g > 1 and the scaling function takes 
the simple form, 



x* (g,r) = e(l-g) 



-1-9 _ I 

r-1 ' 



(31) 



Magically, this form is identical to that derived under 
the assumption of no back mutations to the wild-type se- 
quence. The universal scaling behavior within the varia- 
tional approach is robust, hinting that the simple varia- 
tional ansatz captures the essence of the phase transition. 



IV. NO BACK MUTATION? 

The variational ansatz also provides a helping hand 
to checking how good the assumption of no back mu- 
tations is. A good indicator is the ratio between the 
back-mutating rate and the rate to stay on the wild-type 
sequence, 



x(u,L,r) 



(32) 



Within the variational approximation, all x* = x\ for 
j ^ and the summation in the numerator can be carried 
out without difficulty, 



x(u,L,r) 



Xofhimoo 
/ 1 - TO 00 



Cl 



Jm m oa y/(N s - 1) 



co 



(33) 
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Figure 3: The back-mutation ratio x{ u , L, r) plotted against 
the mutation rate u and the rescaled variable g = uL/lnr. 
The fitness landscape consists of a single sharp peak with 
relative fitness r = fu / f — 2 to a uniform background fitness 
/ for all other sequences. The genome lengths range from 
L = 10 (red) to L — 22 (blue). For large enough genome 
length, the back-mutation rate is indeed negligibly small when 
compared with the rate to stay on the dominant sequence. 
However, beyond the mutation threshold, the ratio x grows 
exponentially with the disappearance of quasispecies. 

For r = 2, the back-mutation indicator \ is shown in Fig. 
3. When the quasispecies exists, the back-mutation rate 
is exponentially small even for a finite genome length. 
Therefore, the no-back- mutation assumption is justified 
in this regime. On the other hand, when the mutation 
probability u exceeds the error threshold, the indicator 
X grows exponentially and shows that the back-mutation 
rate dominates the rate to stay on the wild-type sequence. 
The dynamics is dominated by mutual mutations be- 
tween different sequences and no selection is at work. 
In this regime, the assumption of no back mutations is 
inappropriate and the variational ansatz delivers a bet- 
ter description. Again, when plotted against the rescaled 
variable g = uL/ In r. it is shown in Fig. 3 that all curves 
collapse onto a universal one with the threshold g c — 1. 

Similarly, by taking the genome length to infinity, the 
scaling function for %(<?, r) can be derived. Note that the 
L — y oo limit is taken while keeping both g and r finite. 
The only tricky term in x(<?) r ) i s 

lim ; 1 = 6( 5 - 1)-=- r. (34) 

Thus, the scaling function for the back- mutation indica- 
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Figure 4: The eigenvalues of the (L + 1) x (L + 1) re- 
duced matrix f/„ m and the 2 L x 2 L full growth matrix G 
(dj = rriijfj — 1) at L = 10 plotted against the scaled mu- 
tation parameter g — uL/lnr. We see an exact match of the 
eigenvalues of these two matrices and some additional eigen- 
values of the full matrix not matched by those of the reduced 
one, that are likely corresponding to non-symmetric modes of 
decay. 

tor in the infinite genome length is 

X (g,r) = e(g-1) (r^-l). (35) 

The universal function x(<?i r ) is zero in the presence of 
the quasispecies, i.e., ignoring the back mutations to the 
wild- type sequence is justified. 

V. NUMERICAL VERIFICATION 

To check the validity of the variational ansatz, we also 
solve the Eigen model numerically. We start with Eigen's 
proposal and study the evolution of populations Xi(t) 
(not relative populations) in discrete generations, 

X i {t + l)=Y,m ij f j X i {t), (36) 

3 

where the mutation matrix and the fitness function 
fi are defined previously. It is convenient to introduce 
the growth matrix G denned as 

Gij = rriijfj - Sij- (37) 

Positive eigenvalues of the growth matrix imply popu- 
lation growth and negative ones mean diminishing pop- 
ulations. The stationary state then corresponds to the 
eigenstate of G with the largest eigenvalue. 

To check the validity of the s-wave decomposition, we 
compute the eigenvalues of G in the projected sequence 
space and compare the results to the exact diagonaliza- 
tion in the full space. As shown in Fig. 4, the s-wave de- 
composition works rather well. Near the error threshold 
g c = 1, there are only two symmetric states coming close 
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Figure 5: Survival population of the wild-type sequence xo 
plotted against the rescaled variable g in the Crow-Kimura 
model (solid lines) and the discrete Eigen model (dashed 
lines). For genome length L — 16, the survival population 
from the discrete Eigen model is well described by the univer- 
sal scaling function in Fig. 2 and approaches that obtained 
in the Crow-Kimura model when the relative fitness r ap- 
proaches unity. 
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Figure 6: The survival population of the wild-type sequence 
xo, varying from L — 4 (red) to L = 16 (blue) in the continu- 
ous limit (it — > and r — > 1 + ). Note that the scaling function 
with infinite genome length in the continuous limit is just the 
linear function x^{g) = l — g below the error threshold g c — 1. 



to each other. A detail check reveals that these are just 
the variational wave functions we chose before. There- 
fore, the numerical results lend hands to the support of 
our variational ansatz. Although the numerical results 
presented here are for evolution in discrete time steps, 
one can prove (not shown here) that the stationary state 
of the quasispecies equation gives identical configuration. 

It is interesting to go beyond the Eigen model and see 
whether the variational ansatz still works. It is known in 
the literature, by taking the evolutionary time step to the 
continuous limit, the discrete Eigen model becomes the 
Crow-Kimura model, described by the set of differential 
equations, 



dXj 
dt 



G{Xi 



7 , 1 - - v • 



(38) 



where G t = G + 8 i0 (GM 
i-th sequence and Ey is 
elements E 



= r for dij = 



G) is the grow rate for the 
the mutation-rate matrix with 
1, r,,- = —TL, and zero oth- 



erwise. The rescaled variable g is of crucial importance 
in comparing results from different approaches. When 
taking the continuous limit, the relations between u 
(Eigen) and Gi, T (Crow-Kimura) are 



fi = l + GiAt, 



TAt, 



(39) 



where At is the microscopic time step before taking con- 
tinuous limit. The rescaled variable g thus takes a slightly 
different form, 



uL 



TLAt 



HfM/f) 

TL 

Gm — G 



lim — ; „ 

At^o ln(l + G M Ai) 



ln(l + GAt) 
(40) 



Plotted against the rescaled variable g, the survival popu- 
lations of the wild-type sequence for discrete Eigen model 
and the continuous Crow-Kimura model are shown in 
Fig. 5. For discrete Eigen model with genome length 
L = 16, the survival population xq is very close to the 
universal scaling function in Fig. 2. As the relative fit- 
ness r = /m// approaches unity, the curve moves closer 
to that obtained from the Crow-Kimura model. The re- 
sults from the Crow-Kimura model at shorter genome 
lengths are also plotted for later comparison. 

To compare the stationary states in the quasispecies 
equation and the Crow-Kimura model, we need to take 
r — > 1 limit in the variational ansatz. 



e c {g,L) = lim 



&c{g,L) = lim - 

r— >1 r 



l r — 1 
A 
^1 



1-9 



fJ 



2 L -1 



1 



(41) 
(42) 



After some algebra, the survival frequency of the domi- 
nant sequence Xg(g, L) in the continuous limit is 



vg+A|+ec 
v/ £ 2 + A2 + e c 4 
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(43) 



These curves for finite genome length L are plotted in 
Fig. 6 and agree rather well with those obtained from 
Crow-Kimura model in Fig. 5. Furthermore, the above 
expression becomes extremely simple when the genome 
length goes to infinity, 



lim x* {g,r) 

r— >1 



9(1 -g) lim 



= 6(l- 5 ) x (l-g). 



1 



(44) 



It is quite remarkable that the universal scaling function 
with infinite genome length in the continuous limit is just 
a straight line. 

The numerical results from the discrete Eigen model 
and the Crow-Kimura model agree very well with the 



7 



variational ansatz we proposed for the quasispecies equa- 
tion. The success of the variational approach relies on the 
fact that there are only two active states near the phase 
transition (error threshold), verified by the exact diag- 
onalization. Although the exact solution for the Eigen 
model has been obtained in the literature, our variational 
ansatz provides a simpler approach at the cost of negli- 
gible errors. The no-back- mutation assumption is exam- 
ined and only remains justified in the presence of the 
quasispecies. The universal scaling behavior near the er- 
ror threshold was unexpected and further confirmed the 
validity of the variational approach. Though the emer- 



gence of the scaling behavior is not yet fully understood, 
it is remarkable that the universal scaling function reigns 
even for relatively short genome length such as L = 16. 
Further investigations may reveal the mechanism of the 
universal scaling and extract the essential ingredients for 
the emergence of the quasispecies in molecular evolution. 
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